EMT is using an interpreter for its top level language. This is not fast. EMT is only fast if the underlying vectorization is used.
For an example of this effect, let us write a loop which counts how many integers of the form
are divisible by 7.
>function test1 () ... res=0; for i=1 to 1000 for j=1 to 1000 if mod(i^2+j^2,7)==0 then res=res+1; endif; end end return res; endfunction
>tic; test1(), toc;
20164 Used 1.971 seconds
It is possible to speed this up using vectorization. This trick avoids the interpreted language and relies on the underlying C code to do the loops.
- Generate a matrix M[i,j]=i^2+j^2. - Flatten the matrix to a vector. - Apply the "mod" function to the vector. - Sum the vector and count the ones.
The matrix generation is done by the matrix language of EMT.
>function test2 () ... M=(1:1000)^2+((1:1000)^2)'; return sum(mod(flatten(M),7)==0) endfunction
>tic; test2(), toc;
20164 Used 0.029 seconds
Typically, we are now 50 times faster.
But this kind of tricks make EMT and Matlab hard to read languages. It might be better to use a low level language for the work.
Before we use C, we make the user directory current, so that the compiled code of C can be stored.
>cd(eulerhome())
C:\Users\reneg\Euler
Then we can write pure C code to count. The last line "new_real(res)" is necessary to put the result on the EMT stack.
>function tinyc test3 () ... int res=0; for (int i=1; i<=1000; i++) for (int j=1; j<=1000; j++) if ((i*i+j*j) % 7 == 0) res++; new_real(res); endfunction
Typically, this is 5 times faster than the vectorized code above, and thus 250 times faster than the loop in the basic language.
>tic; test3(), toc;
20164 Used 0.007 seconds
We now look at an example where vectorization is not that easy to find.
For this, we count the number of prime numbers up to 100'000. EMT already has a function "isprime". It works for vectors. Thus we can use the following.
>tic; sum(isprime(1:100000)), toc;
9592 Used 0.75 seconds
The function "isprime" is partly vectorized as you can see in the line after "return". It works in the following way.
- Check n dividing by all numbers 2,3,5,... to sqrt(n). - Determine if any check was positive.
Of course, it would be nicer to check only up to a positive index.
>type isprime
function map isprime (n: integer) if n < 2 then return 0; elseif n==2 or n == 3 or n == 5 or n== 7then return 1; else if mod(n,2)==0 then return 0; endif; return !any(mod(n,3:2:sqrt(n))==0); endif endfunction
If we try to implement this without vectorization we end with a code similar to the following.
>function map emtisprime (n) ... if n<2 then return 0; endif; if n==2 then return 1; endif; if mod(n,2)==0 then return 0; endif; k=3; repeat while k*k<=n; if mod(n,k)==0 then return 0; endif; k=k+2; end return 1; endfunction
Let us first see if it works.
>n=1:100; n[nonzeros(emtisprime(n))]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
This code shows how inefficient the programming language is when it comes to loops. We lose the factor of 5 typically compared to the partly vectorized code.
>tic; sum(emtisprime(1:100000)), toc;
9592 Used 4.171 seconds
As a conclusion, it is much better to use the matrix language for vectorization. This is true even if we loop more than necessary.
Let us try a bit of C code. This is much harder to understand since we need to pass the argument from EMT to C using the C macro "ARG_DOUBLE". We also use the "CHECK" macro.
>cd(eulerhome())
C:\Users\reneg\Euler
>function tinyc cisprimesing (n) ... ARG_DOUBLE(n); int m=(int)n; CHECK(n>0 && n==m,"Need positive integer."); int res=0; if (m==2) res=1; else if (m%2==0) res=0; else if (m>2) { int k=3; res=1; while (k*k<=m) { if (m%k==0) { res=0; break; } k+=2; } } new_real(res); endfunction
Now, this function does not vectorize and works only for a single number. So we use another function which maps to the arguments.
>function map cisprime (v : index) := return cisprimesing(v) >n=1:100; n[nonzeros(cisprime(n))]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
>tic; sum(cisprime(1:100000)), toc;
9592 Used 0.194 seconds
Typically, we gain a factor of 20 to our self-made code, and a factor of 4 to the vectorized code in the EMT function isprime().
Let us see how much the vectorization of "map" in EMT costs. For that, we vectorize in C.
>function tinyc cisprimemaps (v) ... ARG_DOUBLE_MATRIX(x,r,c); RES_DOUBLE_MATRIX(y,r,c); int i; for (i=0; i<r*c; i++) { int res=0; int m=(int)(*x); if (m==2) res=1; else if (m%2==0) res=0; else if (m>2) { int k=3; res=1; while (k*k<=m) { if (m%k==0) { res=0; break; } k+=2; } } *y++=res; x++; } endfunction
We now gain another factor of 500 to even the best non-C code in EMT.
>tic; sum(cisprimemaps(1:100000)), toc;
9592 Used 0.009 seconds
Since C is hard to read and understand, EMT can use Python. The recent version can work with Python 3.8. If you have an older version for Python 2 the following timings my be different.
The code is now a direct translation of the simple loop that we used in the EMT language above.
>function python pyisprimesing (n) ... if n<2: return 0 if n==2: return 1 if n%2 == 0: return 0 k=3; while k*k <= n: if n%k == 0: return 0 k=k+2 return 1 endfunction
It does not vectorize. So we do that via "map".
>function map pyisprime (v : index) := return pyisprimesing(v) >tic; sum(pyisprime(1:100000)), toc;
9592 Used 0.412 seconds
That is about 10 times faster than the EMT syntax.
Note that we can now use the function "pyisprime" in Python too.
>>> print(pyisprimesing(11))
1
Thus we can easily vectorize the function in Python itself.
>function python pyisprimemaps (v : index) ... w=list(v) for i in range(len(v)): w[i]=pyisprimesing(v[i]) return w endfunction
This gains another factor of about 2.
>tic; sum(pyisprimemaps(1:100000)), toc;
9592 Used 0.105 seconds
But we still lose a factor of 10 compared to a vectorized version in C. Python is typically a lot less efficient that comparable C code.
The sieve of Eratosthenes is another example of a partial vectorization in EMT.
As you see in the code below, the EMT code for the sieve generates the multiples of prime numbers in each step and cancels them from the sieve using vectorized code. Note that the vector for the sieve contains only odd numbers which makes the understanding of the code a bit more complicated but makes it more efficient.
>type primes
function primes (n: integer) if n>=3 len = floor((n-1)/2); sieve = ones ([1,len]); for i=1 to (sqrt(n)-1)/2; if (sieve[i]) then sieve[3*i+1:2*i+1:len] = 0; .. endif end return [2, 1+2*nonzeros(sieve)]; elseif n>=2 return 2; else return []; endif endfunction
>primes(100)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
The sieve is nevertheless enormously fast.
>tic; length(primes(1000000)), toc;
78498 Used 0.015 seconds
To beat this you will need a really good C code.
The following is a simple translation of the sieve algorithm in C. For the work area, we get a part of the EMT stack. We store each number as a character which is 1=prime and 0=non-prime to save space.
>cd(eulerhome())
C:\Users\reneg\Euler
>function tinyc cprimes (n) ... ARG_DOUBLE(n); int m=(int)n; CHECK(n>1 && m==n,"Need an integer for cprimes"); char *v=getram(m); CHECK(v,"Could not allocate memory"); for (int i=2; i<m; i++) v[i]=1; for (int i=2; i*i<=m; i++) { if (v[i]) for (int j=2*i; j<m; j+=i) v[j]=0; } int count=0; for (int i=2; i<m; i++) count+=v[i]; header *hd=new_matrix(1,count); double *x=matrixof(hd); for (int i=2; i<m; i++) if (v[i]) *x++=i; moveresults(hd); endfunction
>cprimes(100)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
The result is not much faster than the EMT code.
>tic; length(cprimes(1000000)), toc;
78498 Used 0.014 seconds